2.1 Summary and EDA Report

Our exploratory data analysis (EDA) of comments and posts pertaining to Dogecoin has yielded significant insights into the temporal frequency of these posts, patterns of user/account activity, and content indicative of buy/sell signals. This analysis reveals a pronounced correlation between the volume of Dogecoin-related posts and price fluctuations throughout 2022, particularly when aggregating monthly counts and prices.
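The monthly aggregation and correlation described above can be sketched as follows. This is a minimal plain-Python illustration, not the project's own code. It assumes that timestamps are Unix seconds in UTC (like created_utc), that prices come as a hypothetical month-to-average-price mapping, and that the correlation measure is Pearson's coefficient. All function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from math import sqrt


def monthly_counts(timestamps):
    """Count posts/comments per calendar month (UTC) from Unix timestamps in seconds."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")
        for ts in timestamps
    )


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_price_correlation(timestamps, monthly_price):
    """Correlate monthly post volume with monthly price, using months present in both."""
    counts = monthly_counts(timestamps)
    months = sorted(set(counts) & set(monthly_price))
    return pearson([counts[m] for m in months], [monthly_price[m] for m in months])
```

With pandas, `Series.corr()` computes the same Pearson coefficient on two aligned monthly series.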

Additionally, we observed a notable divergence in Dogecoin-related activity between r/Cryptocurrency and r/Dogecoin. As the value of Dogecoin declined in June 2022, the volume of posts and comments in r/Dogecoin decreased. Over the same period, Dogecoin-related activity in the broader r/Cryptocurrency subreddit increased. This shift warrants further investigation into whether the two user cohorts differ in their investment portfolios or attitudes.

Moreover, we examined user behavior at the individual level by analyzing the hours of the day and the months in which users are most active. This revealed a peak in posting activity at 16:00 local time. The number of active users per month also followed a pattern mirroring price movements, suggesting a potential alignment between user engagement and market trends.
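The hour-of-day and monthly-activity analyses can be sketched as below. This is a minimal illustration under assumptions, not the project's code. Input is assumed to be Unix timestamps and hypothetical (author, timestamp) pairs, and the helper names are hypothetical. Hours here are computed in UTC, the zone of the created_utc field. Reporting them in a local time zone, as the 16:00 figure implies, would need an extra conversion step, for example with Python's `zoneinfo.ZoneInfo`.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def utc_datetime(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def peak_posting_hour(timestamps):
    """Return the hour of day (0-23, UTC) with the most posts/comments."""
    per_hour = Counter(utc_datetime(ts).hour for ts in timestamps)
    hour, _ = per_hour.most_common(1)[0]
    return hour


def active_users_per_month(records):
    """Count distinct authors per month from (author, unix_timestamp) pairs."""
    authors_by_month = defaultdict(set)
    for author, ts in records:
        authors_by_month[utc_datetime(ts).strftime("%Y-%m")].add(author)
    return {month: len(authors) for month, authors in sorted(authors_by_month.items())}
```

Counting distinct authors with a set means a user who posts many times in a month is counted once. This matches the idea of "active users" as opposed to raw post volume.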

Finally, we investigated content related to investment strategies, specifically linguistic cues indicating buy or sell recommendations. In general, buy recommendations are more prevalent than sell recommendations. This is consistent with the community’s composition, which consists predominantly of cryptocurrency enthusiasts and believers. In a bullish market, encouraging others to buy can be strategically sound whether one intends to sell or to hold.

2.2 Data source and cleaning

2.2.1 Data Cleaning

Cleaning 

Before starting the exploratory data analysis (EDA), we had to make sure that the dataset was clean and appropriately structured. The steps we followed are listed below:

  • Filter to required subreddits - r/Dogecoin and r/Cryptocurrency

    • Get all posts from r/Dogecoin.

    • Get only posts from r/Cryptocurrency which contain ‘doge’ or ‘dogecoin’.

  • Remove posts with missing values or ‘[deleted]’.

  • Convert dates from Unix timestamps to the YYYY-mm-dd-hh format (needed for time-specific analyses).
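The cleaning steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebook's actual code. Posts are assumed to be dicts with subreddit, author, created_utc, title and selftext keys. Only None and '[deleted]' are treated as missing values. `clean_posts` and the `date_hour` field are hypothetical names, and hours are formatted in UTC.

```python
import re
from datetime import datetime, timezone

KEEP_SUBREDDITS = {"dogecoin", "cryptocurrency"}
DOGE_PATTERN = re.compile(r"doge", re.IGNORECASE)  # 'doge' also matches 'dogecoin'
MISSING = (None, "[deleted]")  # assumption: what counts as a missing value


def clean_posts(posts, required=("author", "created_utc", "title", "selftext")):
    """Apply the subreddit filter, missing-value removal and date conversion."""
    cleaned = []
    for post in posts:
        subreddit = (post.get("subreddit") or "").lower()
        # Keep only r/Dogecoin and r/Cryptocurrency
        if subreddit not in KEEP_SUBREDDITS:
            continue
        # Drop records with a missing or '[deleted]' required field
        if any(post.get(field) in MISSING for field in required):
            continue
        # r/Cryptocurrency posts must mention doge/dogecoin; r/Dogecoin keeps all
        text = f"{post['title']} {post['selftext']}"
        if subreddit == "cryptocurrency" and not DOGE_PATTERN.search(text):
            continue
        # Unix seconds -> 'YYYY-mm-dd-hh' (UTC)
        row = dict(post)
        row["date_hour"] = datetime.fromtimestamp(
            post["created_utc"], tz=timezone.utc
        ).strftime("%Y-%m-%d-%H")
        cleaned.append(row)
    return cleaned
```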

Merging

We merged the submissions and comments datasets on the post ID, which is stored as id in the submissions dataset and as link_id in the comments dataset. The entire data cleaning process is documented in the ‘project_eda_cleaning.ipynb’ notebook. The characteristics of the merged dataset are summarized below.
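A minimal sketch of the merge in plain Python follows, under stated assumptions. First, link_id carries Reddit's 't3_' type prefix, which is standard for post fullnames. Second, com_submis_id is link_id with that prefix removed. Third, the join is an inner join, so posts without any comments are dropped. The helper names are hypothetical. In Spark, the same idea would use `pyspark.sql.functions.regexp_replace` and `DataFrame.join`.

```python
def strip_link_prefix(link_id, prefix="t3_"):
    """Comment link_id values look like 't3_<post id>'; drop the prefix if present."""
    return link_id[len(prefix):] if link_id.startswith(prefix) else link_id


def merge_posts_comments(submissions, comments):
    """Inner-join each comment onto its parent submission.

    Comment fields are renamed with a 'com_' prefix, mirroring the merged schema.
    """
    posts_by_id = {post["id"]: post for post in submissions}
    merged = []
    for comment in comments:
        submis_id = strip_link_prefix(comment["link_id"])
        parent = posts_by_id.get(submis_id)
        if parent is None:
            continue  # parent post was filtered out during cleaning
        row = dict(parent)
        row.update({f"com_{key}": value for key, value in comment.items()})
        row["com_submis_id"] = submis_id
        merged.append(row)
    return merged
```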

Summary

The merged dataset has 587,972 rows and 19 columns. The majority of posts and comments come from r/Dogecoin (487,037), and the rest come from r/Cryptocurrency (100,935).

Variable list

The schema and the variable types are listed below.

subreddit: string (nullable = true)
subreddit_id: string (nullable = true)
id: string (nullable = true)
created_utc: long (nullable = true)
author: string (nullable = true)
is_self: boolean (nullable = true)
num_comments: long (nullable = true)
score: long (nullable = true)
selftext: string (nullable = true)
title: string (nullable = true)
com_subreddit: string (nullable = true)
com_subreddit_id: string (nullable = true)
com_id: string (nullable = true)
com_created_utc: long (nullable = true)
com_author: string (nullable = true)
com_link_id: string (nullable = true)
com_score: long (nullable = true)
com_body: string (nullable = true)
com_submis_id: string (nullable = true)

New variables

We derived several new variables for the analysis, described below.

  • Buy signals (buy_sig): a flag set when the post or any of its comments contains at least one of these keywords: buy|bought|moon|hold|call|bull|like|yolo

  • Contains ‘doge|dogecoin’: a flag indicating whether a post or comment mentions ‘doge’ (which also matches ‘dogecoin’).

  • Post activity per minute (per hour): the average number of comments on a post per minute (per hour). It is computed by dividing the total number of comments by the time elapsed between the creation of the post and its last comment.

  • Day, month and hour: extracted after converting created_utc from a Unix timestamp to the YYYY-mm-dd-hh format, as described above.

  • Percentage of posts from r/Dogecoin (pct_post_rdoge): the proportion of posts that come from r/Dogecoin relative to all posts across the two subreddits.
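The derived variables above can be sketched as small functions. This is a minimal illustration, not the notebook's code. Keyword matching is assumed to be case-insensitive substring matching, so ‘like’ also matches ‘likely’. Returning NaN when post activity is undefined (no comments, or zero elapsed time) is our choice. All function names are hypothetical.

```python
import math
import re

# Keywords from the buy_sig definition; plain substring matching, so 'like'
# also matches 'likely' and 'hold' matches 'holder'.
BUY_PATTERN = re.compile(r"buy|bought|moon|hold|call|bull|like|yolo", re.IGNORECASE)
DOGE_PATTERN = re.compile(r"doge", re.IGNORECASE)


def buy_sig(post_text, comment_texts):
    """True if the post or any of its comments contains a buy keyword."""
    texts = [post_text, *comment_texts]
    return any(BUY_PATTERN.search(text) for text in texts)


def mentions_doge(text):
    """True if the text mentions 'doge' (which also covers 'dogecoin')."""
    return DOGE_PATTERN.search(text) is not None


def post_activity(post_created_utc, comment_created_utcs, per_seconds=60):
    """Comments per minute (per_seconds=60) or per hour (per_seconds=3600).

    Total comments divided by the time from post creation to the last comment.
    Returns NaN when there are no comments or the duration is not positive.
    """
    if not comment_created_utcs:
        return math.nan
    duration = max(comment_created_utcs) - post_created_utc
    if duration <= 0:
        return math.nan
    return len(comment_created_utcs) / (duration / per_seconds)


def pct_post_rdoge(subreddits):
    """Percentage of records whose subreddit is r/Dogecoin."""
    if not subreddits:
        return math.nan
    return 100 * sum(s.lower() == "dogecoin" for s in subreddits) / len(subreddits)
```

For example, a post with three comments whose last comment arrives ten minutes after posting has an activity of 0.3 comments per minute, or 18 per hour.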

2.2.2 Price Query

Dogecoin vs. Bitcoin Price

The graph shows the price fluctuations of Bitcoin and Dogecoin in 2022.
The analysis shows that both Bitcoin and Dogecoin declined in value in 2022, coinciding with the broader transition from a bull to a bear market in cryptocurrency. Notably, Dogecoin was more volatile than Bitcoin. This heightened fluctuation likely reflects the fact that Dogecoin’s valuation is driven largely by community sentiment rather than by intrinsic economic factors. A particularly intriguing observation was Dogecoin’s price surge around the time of the FTX crisis, suggesting that its price may respond to specific market events.

Dogecoin vs. Bitcoin Growth Rate

To quantify these trends, we computed growth rates from period-over-period price differences. The growth rates reinforce the preliminary findings and highlight Dogecoin’s pronounced sensitivity to market events such as regulatory changes or major scandals. The comparison shows that Bitcoin and Dogecoin behave differently under the same market conditions. It also sheds light on the factors that drive volatility in digital currencies, in particular the influence of community engagement and external events on market behavior.
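A growth rate based on periodic differences can be written as r_t = (p_t - p_{t-1}) / p_{t-1} for consecutive periods. The sketch below assumes a chronologically ordered list of prices, one per period. Using the standard deviation of the growth rates to compare the volatility of the two coins is our assumption, not necessarily the measure used in this report. In pandas, `Series.pct_change()` produces the same rates.

```python
from statistics import pstdev


def growth_rates(prices):
    """Period-over-period growth rates: (p_t - p_{t-1}) / p_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def growth_volatility(prices):
    """Population standard deviation of the growth rates, a simple volatility proxy."""
    return pstdev(growth_rates(prices))
```

For example, growth_rates([100, 110, 99]) gives approximately [0.1, -0.1]. A higher growth_volatility for the Dogecoin series than for the Bitcoin series would express the difference in volatility numerically.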